| Mean Hesitant (%) | SD Hesitant | Min Hesitant (%) | Max Hesitant (%) |
|---|---|---|---|
| 13.42069 | 4.785884 | 2.69 | 26.7 |
PM 592: Data Analysis of COVID-19 Vaccine Hesitancy and Possible Demographic and Geographic Correlations
PM 592 Final Project
Link to full code on Github: https://github.com/nktang05/PM566Final/blob/main/PM592.qmd
Introduction
COVID-19 vaccine hesitancy refers to the reluctance or refusal to get vaccinated despite the availability of vaccines. Vaccination plays a crucial role in controlling the pandemic by reducing the spread of the virus, preventing severe illness, and decreasing hospitalization and death rates. However, hesitancy has been influenced by factors such as misinformation, distrust in healthcare systems or government authorities, concerns about the speed of vaccine development, and fears about potential side effects. Social, cultural, and political contexts have also shaped people’s attitudes toward vaccines.
This data set provides demographic information by county and state, including ethnicity, the COVID-19 vaccine coverage index (CVAC), and the social vulnerability index (SVI). To determine hesitancy levels, respondents were asked, “Once a vaccine to prevent COVID-19 is available to you, would you…get a vaccine?” with the following options: 1) “definitely get a vaccine”; 2) “probably get a vaccine”; 3) “unsure”; 4) “probably not get a vaccine”; 5) “definitely not get a vaccine”. The data set also reports varying levels of hesitancy: hesitant, hesitant or unsure, and strongly hesitant. People who responded “probably not” or “definitely not” were categorized as hesitant.
Data set origin: https://data.cdc.gov/Vaccinations/Vaccine-Hesitancy-for-COVID-19-County-and-local-es/q9mh-h2tw/about_data
I also utilized a data set on educational attainment (bachelor's degree or higher) by state for the year 2021. Data set origin: https://fred.stlouisfed.org/release/tables?rid=330&eid=391444&od=2021-01-01#
I also utilized a data set on COVID-19 Mortality by state for 2021. Data set origin: https://www.cdc.gov/nchs/pressroom/sosmap/covid19_mortality_final/COVID19.htm
Research Question
Are there any correlations between demographic, geographical, and social factors and the rates of vaccine hesitancy?
Identify a health-related outcome variable that you want to assess
The health outcome I am going to assess is the rate of vaccine hesitancy.
Identify 2 independent variables that may be associated with that health outcome
The first independent variable is the social vulnerability index (SVI). SVI is a measure of how vulnerable a community is based on factors such as socioeconomic status, minority status, and housing. Higher SVI scores may reflect structural barriers and lower trust in public health, potentially leading to higher vaccine hesitancy. The second independent variable is the vaccine coverage index (CVAC). CVAC measures supply and demand challenges to vaccine rollout based on healthcare accessibility barriers, sociodemographic barriers, and historic undervaccination. Higher CVAC scores indicate greater challenges in vaccine distribution, which may correlate with increased hesitancy. The third independent variable is predominant ethnicity. Cultural factors may influence vaccine hesitancy among specific groups.
Identify 1 independent variable that may be a confounder, and 1 independent variable that may be an effect modifier
The first possible confounder is the percent of adults fully vaccinated. It may be a confounder because higher vaccination rates could reduce hesitancy by normalizing vaccination, and because vaccination rates may reflect better healthcare infrastructure, which could also influence predictors like SVI and CVAC. The second possible confounder is the percent of adults with a bachelor's degree or higher. It may be a confounder because higher educational attainment is often associated with greater health literacy and lower vaccine hesitancy. Communities with higher education levels might have lower SVI scores and higher CVAC scores due to better socioeconomic conditions.
The first possible effect modifier is region. It may be an effect modifier because vaccine hesitancy patterns vary across regions due to cultural, political, and healthcare access differences. The second possible effect modifier is the COVID-19 death rate per 100,000. It may be an effect modifier because higher death rates may increase the perceived threat of COVID-19, modifying the relationship between predictors like CVAC or SVI and vaccine hesitancy.
Methods
Data Cleaning and Wrangling
A CSV file downloaded from the CDC website was read into a data frame. To clean the data, 280 observations containing NA values were removed. One of the columns held latitude and longitude information in the data type “Point”, so I created two new columns for latitude and longitude to put this information in a more usable form for future visualizations. I also added a column for region (Northeast, Midwest, South, West) based on the state, using the Census Bureau designated regions. The state-level 2021 data on educational attainment (bachelor's degree or higher) were added to test for possible correlations with education level, and the state-level 2021 CDC data on COVID-19 mortality were added to test for possible correlations with COVID-19 death rates. The following variables are continuous: average percent of adults fully vaccinated, average percent of adults with a bachelor's degree or higher, average COVID-19 death rate per 100,000, average SVI, and average CVAC. Region and predominant ethnicity are categorical.
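The two derived columns described above can be sketched as follows. This is a minimal Python illustration rather than the R code used in the project. It assumes the “Point” values are stored as text of the form `POINT (longitude latitude)` and that states are identified by two-letter postal codes. The names `parse_point`, `CENSUS_REGIONS`, and `STATE_TO_REGION` are hypothetical.

```python
import re

# Assumed text format: "POINT (longitude latitude)".
POINT_RE = re.compile(
    r"POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"
)

def parse_point(text):
    """Return (latitude, longitude) from a 'POINT (lon lat)' string, or None."""
    match = POINT_RE.fullmatch(text.strip())
    if match is None:
        return None
    lon, lat = (float(group) for group in match.groups())
    return lat, lon

# Census Bureau regions, keyed here by postal code (an assumption about the key).
CENSUS_REGIONS = {
    "Northeast": ["CT", "ME", "MA", "NH", "RI", "VT", "NJ", "NY", "PA"],
    "Midwest": ["IL", "IN", "MI", "OH", "WI", "IA", "KS", "MN", "MO",
                "NE", "ND", "SD"],
    "South": ["DE", "FL", "GA", "MD", "NC", "SC", "VA", "DC", "WV", "AL",
              "KY", "MS", "TN", "AR", "LA", "OK", "TX"],
    "West": ["AZ", "CO", "ID", "MT", "NV", "NM", "UT", "WY", "AK", "CA",
             "HI", "OR", "WA"],
}

# Invert the region lists into a state -> region lookup.
STATE_TO_REGION = {
    state: region
    for region, states in CENSUS_REGIONS.items()
    for state in states
}

print(parse_point("POINT (-86.844516 32.756889)"))
print(STATE_TO_REGION["CA"])
```

Returning `None` for unparseable values keeps malformed rows visible so they can be handled together with the other missing data.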
Aggregate Hesitancy Rates by Ethnicity
The data set has a column for each ethnicity giving the percentage of that ethnicity in the location. I created a new categorical variable whose value is the predominant ethnicity of that location.
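One way to derive such a variable is to take, for each location, the ethnicity whose percentage column is largest. The sketch below is in Python rather than the project's R. The column names in `ETHNICITY_COLUMNS` are assumptions about how the percentage columns are labeled, and `predominant_ethnicity` is a hypothetical helper.

```python
# Assumed mapping from category label to percentage column name.
ETHNICITY_COLUMNS = {
    "Hispanic": "Percent Hispanic",
    "non-Hispanic American Indian/Alaska Native":
        "Percent non-Hispanic American Indian/Alaska Native",
    "non-Hispanic Asian": "Percent non-Hispanic Asian",
    "non-Hispanic Black": "Percent non-Hispanic Black",
    "non-Hispanic White": "Percent non-Hispanic White",
}

def predominant_ethnicity(row):
    """Return the label whose percentage is largest in this row.

    Ties go to the label listed first in ETHNICITY_COLUMNS.
    """
    return max(ETHNICITY_COLUMNS, key=lambda label: row[ETHNICITY_COLUMNS[label]])

# Made-up example row, for illustration only.
example_row = {
    "Percent Hispanic": 0.05,
    "Percent non-Hispanic American Indian/Alaska Native": 0.01,
    "Percent non-Hispanic Asian": 0.02,
    "Percent non-Hispanic Black": 0.12,
    "Percent non-Hispanic White": 0.80,
}
print(predominant_ethnicity(example_row))
```

Because the category is the single largest group, a location can be labeled with an ethnicity that makes up well under half of its population.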
Section 1: Preliminary Analysis
Basic preliminary analysis of hesitancy indicates a mean hesitancy of 13.42% with a standard deviation of 4.79%, a minimum of 2.69%, and a maximum of 26.7%.
| N | Mean % Adults Fully Vaccinated | SD % Adults Fully Vaccinated | Mean % Bachelor's Degree or Higher | SD % Bachelor's Degree or Higher | Mean COVID-19 Death Rate (per 100,000) | SD COVID-19 Death Rate (per 100,000) | Mean SVI | SD SVI | Mean CVAC | SD CVAC |
|---|---|---|---|---|---|---|---|---|---|---|
| 2862 | 39.93113 | 14.28988 | 33.09312 | 5.015483 | 101.1049 | 35.60613 | 0.4838959 | 0.2880657 | 0.46587 | 0.275621 |
| Predominant Ethnicity | Count | Percentage (%) |
|---|---|---|
| non-Hispanic White | 2646 | 92.4528302 |
| non-Hispanic Black | 128 | 4.4723969 |
| non-Hispanic American Indian/Alaska Native | 33 | 1.1530398 |
| non-Hispanic Asian | 2 | 0.0698812 |
| Hispanic | 53 | 1.8518519 |
| Region | Count | Percentage (%) |
|---|---|---|
| Midwest | 1055 | 36.862334 |
| Northeast | 218 | 7.617051 |
| South | 1185 | 41.404612 |
| West | 404 | 14.116003 |
Preliminary analysis of the variables:
Average percent of adults fully vaccinated: 39.93%
Average percent of adults with a bachelor's degree or higher: 33.09%
Average COVID-19 death rate per 100,000: 101.10
Average SVI: 0.48
Average CVAC: 0.47
Percent of areas by predominant ethnicity:
non-Hispanic White: 92.45%
non-Hispanic Black: 4.47%
non-Hispanic American Indian/Alaska Native: 1.15%
non-Hispanic Asian: 0.07%
Hispanic: 1.85%
Percent of reporting locations by region:
Midwest: 36.86%
Northeast: 7.62%
South: 41.40%
West: 14.12%
Section 2: Simple X, Y Relationship
| Model | Term | Coefficient | Standard Error | P-Value | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|---|---|
| Numeric Predictors | ||||||
| Numeric: Social Vulnerability Index | (Intercept) | 10.8751776 | 0.1659400 | 0.0000000 | 10.5498035 | 11.2005517 |
| Numeric: Social Vulnerability Index | `Social Vulnerability Index (SVI)` | 5.2604513 | 0.2946778 | 0.0000000 | 4.6826489 | 5.8382537 |
| Numeric: CVAC Level of Concern | (Intercept) | 10.1086408 | 0.1603391 | 0.0000000 | 9.7942490 | 10.4230326 |
| Numeric: CVAC Level of Concern | `CVAC level of concern for vaccination rollout` | 7.1093812 | 0.2962265 | 0.0000000 | 6.5285421 | 7.6902202 |
| Numeric: % Adults Fully Vaccinated | (Intercept) | 16.0794578 | 0.2602938 | 0.0000000 | 15.5690753 | 16.5898403 |
| Numeric: % Adults Fully Vaccinated | `Percent adults fully vaccinated against COVID-19 (as of 6/10/21)` | -0.0665839 | 0.0061375 | 0.0000000 | -0.0786183 | -0.0545494 |
| Numeric: % Bachelors Degree or Above | (Intercept) | 34.2610010 | 0.4486915 | 0.0000000 | 33.3812095 | 35.1407925 |
| Numeric: % Bachelors Degree or Above | education | -0.6297477 | 0.0134054 | 0.0000000 | -0.6560329 | -0.6034624 |
| Numeric: COVID Death Rate per 100,000 | (Intercept) | 13.4671389 | 0.2681428 | 0.0000000 | 12.9413643 | 13.9929135 |
| Numeric: COVID Death Rate per 100,000 | `covidDeathRateper100,000` | -0.0011289 | 0.0025016 | 0.6518146 | -0.0060340 | 0.0037762 |
| Categorical Predictors: Ethnicity | ||||||
| Categorical: Predominant Ethnicity | (Intercept) | 13.3483900 | 0.0913358 | 0.0000000 | 13.1692993 | 13.5274807 |
| Categorical: Predominant Ethnicity | Predominant_Ethnicitynon-Hispanic Black | 2.1855162 | 0.4251959 | 0.0000003 | 1.3517943 | 3.0192382 |
| Categorical: Predominant Ethnicity | Predominant_Ethnicitynon-Hispanic American Indian/Alaska Native | 4.9482766 | 0.8229439 | 0.0000000 | 3.3346525 | 6.5619007 |
| Categorical: Predominant Ethnicity | Predominant_Ethnicitynon-Hispanic Asian | 0.5216100 | 3.3234172 | 0.8752954 | -5.9949287 | 7.0381487 |
| Categorical: Predominant Ethnicity | Predominant_EthnicityHispanic | -4.4748051 | 0.6517850 | 0.0000000 | -5.7528217 | -3.1967885 |
| Categorical Predictors: Region | ||||||
| Categorical: Region | (Intercept) | 12.2471090 | 0.1331381 | 0.0000000 | 11.9860527 | 12.5081653 |
| Categorical: Region | RegionNortheast | -3.9495402 | 0.3217275 | 0.0000000 | -4.5803816 | -3.3186988 |
| Categorical: Region | RegionSouth | 3.2073804 | 0.1830489 | 0.0000000 | 2.8484593 | 3.5663016 |
| Categorical: Region | RegionWest | 1.0372227 | 0.2530109 | 0.0000426 | 0.5411204 | 1.5333249 |
Simple regression analysis indicated that SVI was statistically significant (p < .001) with a coefficient of 5.26, indicating that for every one-unit increase in SVI, hesitancy is expected to increase by 5.26 percentage points. CVAC was statistically significant (p < .001) with a coefficient of 7.11, indicating that for every one-unit increase in CVAC, hesitancy is expected to increase by 7.11 percentage points. Because SVI and CVAC are indices scaled from 0 to 1, a one-unit increase spans the full range of each index. Percent of adults fully vaccinated was statistically significant (p < .001) with a coefficient of -0.07, indicating that for every one-percentage-point increase in adults fully vaccinated, hesitancy is expected to decrease by 0.07 percentage points. Education was statistically significant (p < .001) with a coefficient of -0.63, indicating that for every one-percentage-point increase in adults with a bachelor's degree or higher, hesitancy is expected to decrease by 0.63 percentage points. COVID-19 death rate was not statistically significant (p = 0.65), so there is no evidence of a linear association with hesitancy in the unadjusted model. With non-Hispanic White as the baseline ethnicity, predominantly non-Hispanic Black areas (+2.19) and non-Hispanic American Indian/Alaska Native areas (+4.95) have significantly higher hesitancy, predominantly Hispanic areas have significantly lower hesitancy (-4.47), and the difference for non-Hispanic Asian areas is not statistically significant (p = 0.88; only 2 areas). With the Midwest as the baseline region, the South (+3.21) and West (+1.04) have higher hesitancy, while the Northeast has lower hesitancy (-3.95).
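As a quick consistency check on the simple models, an ordinary least squares line always passes through the point of means, so plugging the mean of each predictor into its fitted line should return the mean hesitancy (13.42%). The Python sketch below does this with the coefficients and means reported in the tables above. It only re-uses reported numbers and does not refit any model, and the names `MODELS` and `predict` are illustrative.

```python
# (intercept, slope, mean of predictor), copied from the reported tables.
MODELS = {
    "SVI": (10.8751776, 5.2604513, 0.4838959),
    "CVAC": (10.1086408, 7.1093812, 0.46587),
    "Percent adults fully vaccinated": (16.0794578, -0.0665839, 39.93113),
    "Percent bachelor's degree or higher": (34.2610010, -0.6297477, 33.09312),
}

MEAN_HESITANCY = 13.42069  # reported mean of estimated hesitancy (%)

def predict(intercept, slope, x):
    """Predicted hesitancy (%) from a one-predictor linear model."""
    return intercept + slope * x

for name, (intercept, slope, x_mean) in MODELS.items():
    at_mean = predict(intercept, slope, x_mean)
    print(f"{name}: {at_mean:.2f}% at the predictor mean "
          f"(difference from overall mean: {at_mean - MEAN_HESITANCY:+.4f})")
```

The same `predict` function can be used to read the slopes in context, for example by comparing the SVI model at 0 and at 1 to see the full-range difference of 5.26 percentage points.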
Section 2: Check for confounders
I checked whether percent of adults fully vaccinated and education percentage were confounders of the relationships between hesitancy and the independent variables SVI, CVAC, and predominant ethnicity. Percent of adults fully vaccinated does not appear to be a confounder, because the coefficients of the independent variables are similar with and without it in the model. However, education appears to be a possible confounder. The SVI coefficient is 1.40 in the initial model and 0.60 when adjusted for education, and in the adjusted model SVI is no longer statistically significant. The CVAC coefficient is 6.38 in the initial model and 3.58 when adjusted for education, and it remains statistically significant. The model with education also has an R-squared of .50, higher than the original model's R-squared of .20, indicating a better fit when adjusted for education.
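The size of these shifts can be expressed as a percent change in the coefficient. A common rule of thumb treats a change of more than about 10% as a sign of confounding. That threshold is an assumption added here for illustration and not necessarily the criterion used in this project. A small Python sketch with the coefficients reported above:

```python
def percent_change(crude, adjusted):
    """Percent change in a coefficient after adjusting for a covariate."""
    return abs(crude - adjusted) / abs(crude) * 100

THRESHOLD = 10.0  # rule-of-thumb cutoff (assumption, not from the report)

# (coefficient before, coefficient after adjusting for education)
coefficients = {"SVI": (1.40, 0.60), "CVAC": (6.38, 3.58)}

for name, (crude, adjusted) in coefficients.items():
    change = percent_change(crude, adjusted)
    flag = "possible confounding" if change > THRESHOLD else "little change"
    print(f"{name}: {change:.1f}% change -> {flag}")
```

Both coefficients shrink by far more than 10% (roughly 57% for SVI and 44% for CVAC), which is consistent with treating education as a confounder.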
Section 2: Check for effect modification
I then checked whether region and COVID-19 deaths per 100,000 were possible effect modifiers by testing interaction terms (p-values in parentheses). When testing region with SVI, the interactions with RegionSouth (2.36e-06) and RegionWest (9.62e-12) are statistically significant. When testing region with CVAC, the interactions with RegionSouth (0.002982) and RegionNortheast (0.006293) are statistically significant. When testing region with predominant ethnicity, the interactions of RegionSouth with non-Hispanic American Indian/Alaska Native (0.003821), RegionWest with non-Hispanic American Indian/Alaska Native (1.87e-08), and RegionSouth with Hispanic (0.041065) are statistically significant. When testing COVID-19 death rate with SVI, the interaction is statistically significant (< 2e-16). When testing COVID-19 death rate with CVAC, the interaction is statistically significant (< 2e-16). When testing COVID-19 death rate with ethnicity, the interactions with non-Hispanic American Indian/Alaska Native (0.000126) and non-Hispanic Asian (0.003208) are statistically significant. Based on these p-values I concluded that region and COVID-19 death rate were both effect modifiers.
Section 3 Table 3: Final Model
Call:
lm(formula = `Estimated hesitant` ~ +(`CVAC level of concern for vaccination rollout` * Region) + (education), data = data)
Residuals:
| Min | 1Q | Median | 3Q | Max |
|---|---|---|---|---|
| -13.4084 | -1.8940 | -0.1932 | 1.8329 | 14.9730 |
Coefficients:
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) | Signif. |
|---|---|---|---|---|---|
| (Intercept) | 30.359353 | 0.493490 | 61.520 | < 2e-16 | *** |
| `CVAC level of concern for vaccination rollout` | 2.699532 | 0.408555 | 6.608 | 4.65e-11 | *** |
| RegionNortheast | -0.007013 | 0.428486 | -0.016 | 0.987 | |
| RegionSouth | 0.121749 | 0.296376 | 0.411 | 0.681 | |
| RegionWest | 7.845878 | 0.449591 | 17.451 | < 2e-16 | *** |
| education | -0.577217 | 0.013917 | -41.477 | < 2e-16 | *** |
| `CVAC level of concern for vaccination rollout`:RegionNortheast | 0.041016 | 1.685704 | 0.024 | 0.981 | |
| `CVAC level of concern for vaccination rollout`:RegionSouth | 2.177955 | 0.549589 | 3.963 | 7.59e-05 | *** |
| `CVAC level of concern for vaccination rollout`:RegionWest | -11.516726 | 0.864737 | -13.318 | < 2e-16 | *** |
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.218 on 2853 degrees of freedom
Multiple R-squared: 0.5492, Adjusted R-squared: 0.548
F-statistic: 434.5 on 8 and 2853 DF, p-value: < 2.2e-16
To determine the best model I started with a baseline model including region and COVID-19 deaths as effect modifiers and education as a confounding variable (R-squared = .6772). I then used stepAIC to help make the model more parsimonious. stepAIC output the model Estimated hesitant ~ (CVAC level of concern for vaccination rollout * covidDeathRateper100,000 * Region) + (Region * Predominant_Ethnicity) + (covidDeathRateper100,000 * education), with an R-squared value of .6771. While it was more parsimonious than the original model and had almost the same R-squared value, I wanted to make it even more parsimonious.
The next model is Estimated hesitant ~ (CVAC level of concern for vaccination rollout * covidDeathRateper100,000 * Region) + (education), with an R-squared value of .6314. This model is more parsimonious while keeping the R-squared value as large as possible.
This model formula with coefficients:
Estimated hesitant = 25.227 + 0.208⋅SVI − 1.182⋅CVAC + 0.0216⋅DeathRate + 4.825⋅RegionNortheast − 5.031⋅RegionSouth + 28.553⋅RegionWest − 0.516⋅Education + 0.0454⋅(CVAC⋅DeathRate) + 8.376⋅(CVAC⋅RegionNortheast) + 18.550⋅(CVAC⋅RegionSouth) − 33.204⋅(CVAC⋅RegionWest) − 0.0451⋅(DeathRate⋅RegionNortheast) + 0.0728⋅(DeathRate⋅RegionSouth) − 0.2356⋅(DeathRate⋅RegionWest) − 0.0975⋅(CVAC⋅DeathRate⋅RegionNortheast) − 0.1867⋅(CVAC⋅DeathRate⋅RegionSouth) + 0.2515⋅(CVAC⋅DeathRate⋅RegionWest)
However, I thought this model was overcomplicated when both effect modifiers were used together, so I kept working to find a simpler model. The model is Estimated hesitant ~ (CVAC level of concern for vaccination rollout * Region) + (education). This model is much simpler and more parsimonious. However, it does lose some explanatory power: the R-squared value is now .5492.
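When weighing fit against parsimony, adjusted R-squared is a useful companion to R-squared because it penalizes each additional predictor. The Python sketch below applies the standard formula to the final model's reported values, using n = 2862 observations and 8 predictor terms (taken from the 8 and 2853 degrees of freedom in the model output). The function name is illustrative.

```python
def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Final model: R-squared 0.5492, residual df 2853, 8 predictor terms.
final_model = adjusted_r_squared(0.5492, n_obs=2862, n_predictors=8)
print(round(final_model, 3))
```

The result agrees with the adjusted R-squared of 0.548 in the model summary. With nearly 2,900 observations the penalty for a handful of extra terms is small, so the preference for the simpler model here rests mainly on interpretability.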
This model formula with coefficients:
Estimated hesitant = 30.359 + 2.700⋅CVAC − 0.007⋅RegionNortheast + 0.122⋅RegionSouth + 7.846⋅RegionWest − 0.577⋅Education + 0.041⋅(CVAC⋅RegionNortheast) + 2.178⋅(CVAC⋅RegionSouth) − 11.517⋅(CVAC⋅RegionWest)
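The intercept (p < 2e-16), CVAC (p = 4.65e-11), education (p < 2e-16), RegionWest (p < 2e-16), CVAC×RegionSouth (p = 7.59e-05), and CVAC×RegionWest (p < 2e-16) are all statistically significant. The Midwest is the baseline region. The intercept of 30.36 is the expected hesitancy for a Midwest location when CVAC and education are both 0. In the Midwest, each one-unit increase in CVAC is associated with an expected increase in hesitancy of 2.70 percentage points, holding education constant. At a CVAC of 0, hesitancy in the West is expected to be 7.85 percentage points higher than in the Midwest. The interaction terms change the CVAC slope by region: in the South the slope is 2.70 + 2.18 = 4.88, in the West it is 2.70 − 11.52 = −8.82, and in the Northeast it is essentially the same as in the Midwest (2.70 + 0.04 = 2.74; the interaction is not significant). Each one-percentage-point increase in education is associated with a decrease in hesitancy of 0.58 percentage points. Overall, the terms with positive coefficients (CVAC, RegionSouth, RegionWest, CVAC×RegionNortheast, and CVAC×RegionSouth) raise predicted hesitancy, while the terms with negative coefficients (RegionNortheast, education, and CVAC×RegionWest) lower it.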
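To make the interaction terms concrete, the final model can be written as a small prediction function. The Python sketch below uses the reported coefficients. The names `COEF`, `cvac_slope`, and `predict_hesitancy` are illustrative, and the input values in the example loop are made up for demonstration.

```python
# Coefficients of the final model, with the Midwest as the baseline region.
COEF = {
    "intercept": 30.359353,
    "cvac": 2.699532,
    "education": -0.577217,
    "region": {"Midwest": 0.0, "Northeast": -0.007013,
               "South": 0.121749, "West": 7.845878},
    "cvac_by_region": {"Midwest": 0.0, "Northeast": 0.041016,
                       "South": 2.177955, "West": -11.516726},
}

def cvac_slope(region):
    """Change in hesitancy (points) per one-unit increase in CVAC, by region."""
    return COEF["cvac"] + COEF["cvac_by_region"][region]

def predict_hesitancy(cvac, region, education):
    """Predicted percent hesitant for a location."""
    return (COEF["intercept"]
            + COEF["region"][region]
            + cvac_slope(region) * cvac
            + COEF["education"] * education)

for region in COEF["region"]:
    print(f"{region}: CVAC slope = {cvac_slope(region):.2f}, "
          f"prediction at CVAC 0.5, education 33% = "
          f"{predict_hesitancy(0.5, region, 33.0):.2f}%")
```

Writing the slope as a function of region shows why the West coefficients should be read together: the large positive RegionWest term applies at a CVAC of 0, and the negative interaction offsets it as CVAC rises.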
Goodness of Fit
To check the residuals I used autoplot. The residuals show a slight curve, indicating possible non-linearity, but the relationship looks mostly linear. The tails deviate on the normality plot, indicating slight non-normality in the residuals. The homoscedasticity check shows a slight upward trend, suggesting that the residual variance is not entirely constant.
Conclusion
Overall, the model could be a stronger predictor, but that would mean giving up some parsimony. In the end my model had a fairly decent R-squared value of .5492, meaning it explains about 55% of the variability in vaccine hesitancy.
Interventions
To address vaccine hesitancy, interventions should focus on strategies targeting high-hesitancy states and regions. This can be done with local messaging, education, and community engagement. More vulnerable populations should be prioritized. One area for improvement is access to clinics, not just economically but also physically, through transportation. Combatting misinformation is essential to counter vaccine myths effectively. When working with diverse populations, interventions should involve collaborations with community leaders and culturally tailored messaging to build trust and address the unique barriers these groups face. Culturally relevant interventions, like Es Tiempo, a campaign that raises awareness of cervical cancer prevention among Latinas, have proven to be successful. More data collection and evaluation will help in sustaining vaccination rates across all communities.
Additional Tables and Insight
Figure 3: Average Hesitancy Rates by Social Vulnerability Index
Very high vulnerability has the highest median estimated hesitancy (16.76%). Very low vulnerability has the lowest median hesitancy (10.55%). This is notable because one might expect more vulnerable communities to be less hesitant.
Figure 4: Average Hesitancy Rates by CVAC
Very high concern has the highest median estimated hesitancy (16.80%). Low concern has the lowest median hesitancy (10.18%). It makes sense that areas with very high concern about vaccine rollout challenges would also have high levels of hesitancy. For example, misinformation could both drive high hesitancy and create challenges for vaccine rollout.
Figure 5: Average Hesitancy Rates by Ethnicity
This graph displays the relationship between the percentage of an ethnicity in a population and the estimated hesitancy levels. Some groups, like non-Hispanic Asians, appear to have lower overall hesitancy, while groups such as non-Hispanic Black and non-Hispanic American Indian/Alaska Native show a wider spread and higher average hesitancy.